
[Doc][Misc] Clarify private-repo PR workflow and community PR rollback guidance#4

Closed
Copilot wants to merge 2 commits into main from copilot/remove-pr-from-vllm-ascend

Conversation


Copilot AI commented Mar 11, 2026

The current problem is that the contribution workflow guidance is not clear enough, so changes can easily end up as a PR against the community repo vllm-project/vllm-ascend by mistake. This change unifies the workflow around "open PRs in the private repo / private fork" and adds recovery guidance for PRs mistakenly opened against the community repo.

  • Narrow the problem scope (PR destination)

    • Update the review-checklist requirement: PRs must be opened in the private repository (or a private fork), not in the community repo vllm-project/vllm-ascend.
  • Fix the submission workflow (Quick Start)

    • Step 10 now reads "push to the private repo / private fork".
    • The example remote URL is changed from vllm-ascend to vllm-ascend-dev so it no longer implies the community main repo.
  • Recovery guidance for mistakes

    • Add an explicit step: if a PR was accidentally opened against vllm-project/vllm-ascend, close it and re-open it from the private repo.

Updated example snippet:

git remote add myfork https://github.com/YOUR_USERNAME/vllm-ascend-dev.git
git push -u myfork your-branch-name

And add this note to the docs:

If you accidentally opened a PR in `vllm-project/vllm-ascend`,
close that PR and reopen it in your private repository.


Co-authored-by: lHrHenry233 <99803155+lHrHenry233@users.noreply.github.com>
Copilot AI changed the title [WIP] Remove commit from vllm ascend community PR [Doc][Misc] Clarify private-repo PR workflow and community PR rollback guidance Mar 11, 2026
@lHrHenry233 lHrHenry233 deleted the copilot/remove-pr-from-vllm-ascend branch April 1, 2026 07:20
lHrHenry233 pushed a commit that referenced this pull request Apr 10, 2026
…(v3.1)

- Port upstream _causal_conv1d_fwd_kernel as NPU Triton kernel
  - Handles initial/final/intermediate conv state in-kernel
  - Supports APC block boundary state writes
  - NPU adaptations: removed .cache_modifier, kept debug_barrier
- Rewrite causal_conv1d_fn to dispatch to new Triton kernel
- Rewrite gdn.py conv1d path: split decode/prefill like upstream
  - Decode: causal_conv1d_update_npu with block params
  - Prefill: causal_conv1d_fn with APC params (new kernel)
- Fix SSM #6: _build_initial_state only zeros prefill sequences
- Fix SSM #7: _write_final_states adds slot >= 0 validation
- Fix SSM #8: _scatter_intermediate_states adds unaligned offset
- Update all 36 UTs to pass with new num_computed_tokens_all field
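The SSM #6 fix above (zeroing only prefill sequences in `_build_initial_state`) can be sketched roughly as follows. This is a minimal illustration with plain lists and hypothetical names, not the actual kernel code; decode rows must keep their cached state, and only prefill rows without prior context start from zeros.

```python
def build_initial_state(conv_states, has_initial_state, num_prefills):
    """Sketch of the SSM #6 fix: zero conv state only for fresh prefills.

    conv_states: per-sequence state rows; rows 0..num_prefills-1 are
    prefill sequences, the remaining rows are decodes.
    has_initial_state[i]: True if prefill i continues prior context.
    """
    out = []
    for i, state in enumerate(conv_states):
        if i < num_prefills and not has_initial_state[i]:
            out.append([0.0] * len(state))  # fresh prefill: start from zeros
        else:
            out.append(list(state))         # keep cached / decode state
    return out
```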

Alignment status vs upstream #26807:
  #1 conv1d prefill kernel:     FIXED (kernel ported)
  #3 causal_conv1d_fn params:   FIXED (rewritten)
  #4 intermediate conv state:   FIXED (kernel internal)
  #6 SSM zeroing scope:         FIXED
  #7 _write_final_states guard: FIXED
  #8 SSM scatter alignment:     FIXED
  #9 causal_conv1d_fn signature: FIXED
  #2 decode pre-copy:           KEEP (NPU needs it)
  #5 SSM decode index:          OK (correct approach)
  #10 conv layout hardcoded:    DEFERRED

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
lHrHenry233 added a commit that referenced this pull request May 7, 2026
Five logic / hot-path issues in vllm_ascend/ops/triton/mamba/causal_conv1d.py:

1. (#1) Fallback non-APC `causal_conv1d_fn` ignored `has_initial_state[i]` and
   always seeded the convolution from `conv_states[cache_indices[i]]`. First
   prefill iterations (no prior context) silently used stale cache state.
   Now passes `initial_states=None` when `has_initial_state[i]` is False.

2. (#2) `forward_context` lookup at the top of `causal_conv1d_fn` ran on
   every per-layer APC prefill call, but `num_decodes` is only used by the
   non-APC fallback path. Moved the lookup inside the fallback branch so the
   APC hot path does not pay it 28x per prefill on Qwen3.5.

3. (#3) `NULL_BLOCK_ID` fell back to `PAD_SLOT_ID` (-1) when upstream did
   not export it. vLLM v1 `BlockPool` reserves block_id=0 as the null block
   and vLLM-Ascend `block_table.fill_(0)` matches that semantics, while
   PAD_SLOT_ID=-1 only applies to 1D slot/state-index padding. The fallback
   should be 0 so the `null_block_id != pad_slot_id` normalization actually
   masks 2D block-table placeholders. Documented the distinction at the
   constant definition site.

4. (#4) `_normalize_apc_input_layout` did four heuristic shape comparisons
   to auto-detect dim-major / weight-transposed inputs. This is ambiguous
   when dim==width or dim==tokens, and shifts layout-fix cost into the
   prefill hot path. Replaced with an explicit contract: x is [tokens, dim],
   weight is [dim, width]; fail loud otherwise. Removed the now-unused
   `input_is_dim_major` plumbing and the trailing `out.transpose(0,1).
   contiguous()` in `_causal_conv1d_fwd_npu`.

5. (#5) `_uses_apc_prefill_path` selected APC when ANY of five tensors was
   non-None, masking caller bugs (e.g. block_idx_* provided without a 2D
   block table). Now selects on cache_indices.dim()==2 and raises
   RuntimeError if APC auxiliary tensors are passed without a 2D block
   table, matching the documented dispatcher contract.
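The dispatch rule from point 5 can be sketched as below. The function name is illustrative and the tensor rank is passed as a plain int to keep the sketch self-contained; the point is that APC is selected only on a 2D block table, and auxiliary APC tensors without one are treated as a caller bug.

```python
def uses_apc_prefill_path(cache_indices_ndim, has_apc_aux_tensors):
    """Sketch of the point-5 contract: select APC iff 2D block table.

    cache_indices_ndim: rank of cache_indices (2 means a 2D block table).
    has_apc_aux_tensors: True if any APC auxiliary tensor was passed.
    """
    if cache_indices_ndim == 2:
        return True
    if has_apc_aux_tensors:
        # Previously this silently selected APC; now it fails loudly.
        raise RuntimeError(
            "APC auxiliary tensors passed without a 2D block table")
    return False
```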

Also dropped the redundant `conv_states.shape[-2] != dim and shape[-1] == dim:
transpose` probe in the dispatcher; the caller already passes the standard
`[num_cache_lines, dim, state_len]` layout, and silent re-permutation hides
upstream contract violations.
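The explicit layout contract from point 4 might look like the following shape check (a sketch; the function name is illustrative). It fails loudly instead of heuristically transposing, which is exactly the ambiguity when dim == width or dim == tokens.

```python
def check_conv_input_layout(x_shape, weight_shape):
    """Point-4 sketch: enforce x = [tokens, dim], weight = [dim, width]."""
    tokens, dim = x_shape
    w_dim, width = weight_shape
    if w_dim != dim:
        raise ValueError(
            f"layout contract violated: x {x_shape} vs weight "
            f"{weight_shape}; expected weight dim {dim}, got {w_dim}")
    return tokens, dim, width
```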

No behaviour change on the production caller path (gdn._run_all_mode_non_spec_conv1d),
verified by tracing layout invariants. The fixes harden the dispatcher
against future caller refactors and remove dead/wrong code.
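The null-block normalization from point 3 can be illustrated with the corrected fallback of 0 (a sketch; the helper name is hypothetical, and the block table is a nested list for simplicity):

```python
NULL_BLOCK_ID = 0   # vLLM v1 BlockPool reserves block_id=0 as the null block
PAD_SLOT_ID = -1    # applies only to 1D slot/state-index padding

def mask_null_blocks(block_table, null_block_id=NULL_BLOCK_ID,
                     pad_slot_id=PAD_SLOT_ID):
    # With the old fallback (null_block_id == pad_slot_id == -1) this
    # normalization was a no-op; with 0 it actually masks the 2D
    # block-table placeholders.
    if null_block_id == pad_slot_id:
        return block_table
    return [[pad_slot_id if b == null_block_id else b for b in row]
            for row in block_table]
```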

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: lHrHenry233 <2381623149@qq.com>